@willibrandon
Contributor

Overview

This pull request addresses the encoding issue reported in #35, where JSON-RPC messages printed in the terminal showed corrupted characters (e.g., Chinese characters displayed as question marks).

The problem stemmed from the stdio transport layer relying on the system default encoding (for example, Windows-1252 on Windows) instead of explicitly using UTF-8.
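
As a rough illustration of why the question marks appear, the sketch below round-trips the same string through a non-Unicode encoding and through BOM-less UTF-8. `Encoding.ASCII` is only a stand-in for a legacy code page (an assumption made so the example runs without registering extra code pages), and the `EncodingLossSketch` class and its `RoundTrip` helper are hypothetical names, not part of this PR.

```csharp
using System;
using System.Text;

public static class EncodingLossSketch
{
    // Encodes text to bytes and decodes it back with the same encoding.
    // Characters the encoding cannot represent are replaced by its fallback ('?').
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string original = "上下文伺服器";

        // Stand-in for a legacy single-byte code page (assumption, see lead-in).
        string legacy = RoundTrip(original, Encoding.ASCII);

        // Explicit UTF-8 without a byte order mark, as used in this PR.
        string utf8 = RoundTrip(original, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));

        Console.WriteLine(legacy == original); // False: every character became '?'
        Console.WriteLine(utf8 == original);   // True
    }
}
```

The lossy path does not throw; the replacement happens silently, which is why the corruption only shows up when the text is displayed.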

Changes

  • Encoding Fix: Updated both client and server transports to explicitly use UTF8Encoding (without BOM) for reading and writing:
    // Create streams with explicit UTF-8 encoding to ensure proper Unicode character handling
    // This is especially important for non-ASCII characters like Chinese text and emoji
    var utf8Encoding = new UTF8Encoding(false); // No BOM
    _stdInWriter = new StreamWriter(_process.StandardInput.BaseStream, utf8Encoding) { AutoFlush = true };
    _stdOutReader = new StreamReader(_process.StandardOutput.BaseStream, utf8Encoding);
  • Tests Added: New tests verify that both BMP Unicode characters (Chinese: "上下文伺服器") and non-BMP Unicode characters (emoji: 🔍🚀👍) survive a round trip through the transport unchanged.
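
For reviewers who want to check the behaviour in isolation, here is a minimal sketch of the kind of round trip described above. It uses a `MemoryStream` in place of the child process pipes, which is an assumption for illustration; the `Utf8RoundTripSketch` class and its `RoundTrip` helper are hypothetical and are not the tests added in this PR.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8RoundTripSketch
{
    // Writes one line to an in-memory stream and reads it back,
    // using BOM-less UTF-8 on both sides. Also returns the first byte written.
    public static (string Text, byte FirstByte) RoundTrip(string message)
    {
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        using var buffer = new MemoryStream();

        // leaveOpen keeps the MemoryStream usable after the writer is disposed.
        using (var writer = new StreamWriter(buffer, utf8, bufferSize: 1024, leaveOpen: true))
        {
            writer.WriteLine(message);
        }

        byte firstByte = buffer.ToArray()[0];
        buffer.Position = 0;

        using var reader = new StreamReader(buffer, utf8);
        return (reader.ReadLine() ?? string.Empty, firstByte);
    }

    public static void Main()
    {
        // BMP characters (Chinese) and non-BMP characters (emoji) in one JSON-like payload.
        string message = "{\"text\":\"上下文伺服器 🔍🚀👍\"}";
        var (text, firstByte) = RoundTrip(message);

        Console.WriteLine(text == message);   // True
        Console.WriteLine(firstByte == 0x7B); // True: the stream starts with '{', not a BOM
    }
}
```

Checking the first byte matters because a leading byte order mark would precede the opening brace of the first JSON-RPC message, which a strict parser on the other end may reject.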

Impact

This fix resolves the Unicode character corruption by making the transport layer read and write UTF-8 consistently, regardless of the system locale. The existing API surface is unchanged.

Next Steps

Please review the changes and let me know if any further modifications or additional tests are needed.

@willibrandon willibrandon mentioned this pull request Mar 23, 2025
willibrandon and others added 2 commits March 24, 2025 08:53
This change replaces the default system encoding with an explicit UTF8Encoding (without BOM)
for both client and server transports. This ensures proper handling of Unicode characters,
including Chinese characters and emoji.

- Use UTF8Encoding explicitly for StreamReader and StreamWriter.
- Add tests for Chinese characters ("上下文伺服器") and emoji (🔍🚀👍) to confirm the fix.

Fixes modelcontextprotocol#35.
@stephentoub stephentoub self-assigned this Mar 24, 2025
@eiriktsarpalis eiriktsarpalis linked an issue Mar 24, 2025 that may be closed by this pull request
@stephentoub stephentoub force-pushed the fix/utf8-stdio-encoding branch from 1162a06 to f47d99f Compare March 24, 2025 13:46
@stephentoub stephentoub merged commit 36d5019 into modelcontextprotocol:main Mar 24, 2025
9 of 13 checks passed
@stephentoub
Contributor

Thanks, @willibrandon.
